Feature reuse has been a key technique in light-weight convolutional neural networks (CNNs) design. Current methods usually utilize a concatenation operator to keep large channel numbers cheaply (thus large network capacity) by reusing feature maps from other layers. Although concatenation is parameters- and FLOPs-free, its computational cost on hardware devices is non-negligible. To address this, this paper provides a new perspective to realize feature reuse via structural re-parameterization technique. A novel hardware-efficient RepGhost module is proposed for implicit feature reuse via re-parameterization, instead of using concatenation operator. Based on the RepGhost module, we develop our efficient RepGhost bottleneck and RepGhostNet. Experiments on ImageNet and COCO benchmarks demonstrate that the proposed RepGhostNet is much more effective and efficient than GhostNet and MobileNetV3 on mobile devices. Specially, our RepGhostNet surpasses GhostNet 0.5x by 2.5% Top-1 accuracy on ImageNet dataset with less parameters and comparable latency on an ARM-based mobile phone.
translated by 谷歌翻译
沿着整个空间尺寸聚集的全局空间统计数据广泛用于顶级性能图像恢复器。例如,在挤压和激发(SE)中采用的实例归一化(IN)中采用的实例归一化(IN)的平均值,方差,其被应用于MPRNet。本文首先显示在训练/测试阶段的基于补丁/全部图像的特征上聚合的统计分别可以分发非常不同,并导致图像恢复器中的性能下降。它已被以前的作品被广泛忽视。要解决此问题,我们提出了一种简单的方法,测试时将局部统计转换器(TLSC)替换为仅在测试时间中从全局到本地的统计聚合操作区域。如果没有再培训或芬降,我们的方法显着提高了图像恢复器的性能。特别是,通过将TLSC扩展到最先进的模型,MPRNET升压在GoPro数据集上的PSNR中的0.65 dB,实现了33.31dB,超过了先前的最佳结果0.6 dB。此外,我们只需将TLSC应用于高级视觉任务,即语义细分,并实现竞争结果。进行了广泛的数量和质量实验,以证明TLSC解决了边际成本的问题,同时显着获得。该代码可在https://github.com/megvii-research/tlsc中获得。
translated by 谷歌翻译
There is increasing adoption of artificial intelligence in drug discovery. However, existing works use machine learning to mainly utilize the chemical structures of molecules yet ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecule's chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
translated by 谷歌翻译
知识图的归纳链路预测旨在预测未见实体之间的缺失联系,而那些未在训练阶段显示的实体。大多数以前的作品都学习实体的特定实体嵌入,这些实体无法处理看不见的实体。最近的几种方法利用封闭子图来获得归纳能力。但是,所有这些作品仅在没有完整的邻近关系的情况下考虑子图的封闭部分,这导致了忽略部分邻近关系的问题,并且很难处理稀疏的子图。为了解决这个问题,我们提出了SNRI子图邻近关系Infomax,它足够从两个方面利用完整的相邻关系:节点特征的相邻关系特征和稀疏子图的相邻关系路径。为了进一步以全球方式建模邻近关系,我们对知识图进行创新的相互信息(MI)最大化。实验表明,SNRI在归纳链路预测任务上的大幅度优于现有的最新方法,并验证以全局方式探索完整的邻近关系的有效性,以表征节点特征和在稀疏子分类上的理由。
translated by 谷歌翻译
强化学习算法在竞争挑战板和视频游戏时表现良好。越来越多的研究工作侧重于提高加强学习算法的泛化能力。普通视频游戏AI学习竞赛旨在设计能够学习在培训期间出现不同游戏水平的代理商。本文总结了五年的一般视频游戏AI学习竞争。在每个版本,设计了三场新游戏。对于每场比赛,通过扰动或组合两个训练水平来产生三个测试水平。然后,我们提出了一种新颖的加强学习框架,对一般视频游戏的双程观察,在假设中,它更有可能在不同级别而不是全局信息中观察到类似的本地信息。因此,我们所提出的框架而不是直接输入基于目前游戏屏幕的单个原始像素的屏幕截图,而是将游戏屏幕的编码,转换的全局和本地观测视为两个同时输入,旨在学习播放新级别的本地信息。我们提出的框架是用三种最先进的加强学习算法实施,并在2020年普通视频游戏AI学习竞赛的游戏集上进行了测试。消融研究表明,使用编码,转换的全局和本地观察的出色性能。总体上最好的代理商进一步用作2021次竞赛版的基线。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译